Parallel Numerical Simulation of Seismic Waves Propagation with Intel Math Kernel Library
Abstract
This paper describes a parallel implementation for modeling seismic waves in heterogeneous media based on the Laguerre transform with respect to time. The main advantages of the transform are that the spatial part of the operator is sign-definite and that it does not depend on the separation parameter. These properties make it possible to organize parallel computations efficiently by decomposing the computational domain and then applying the additive Schwarz method. At each step of the Schwarz alternations, the system of linear algebraic equations in each subdomain is solved independently of all the others. A proper choice of domain decomposition reduces the size of the matrices and makes direct solvers practical, in particular those based on LU decomposition. Because the matrix does not depend on the parameter of the Laguerre transform, the LU decomposition for each subdomain is computed only once, stored in memory, and then reused for different right-hand sides. Software is being developed for a cluster using hybrid OpenMP and MPI parallelization. At each cluster node, a system of linear algebraic equations with different right-hand sides is solved by the direct sparse solver PARDISO from the Intel Math Kernel Library (Intel MKL). The solver is extensively parallelized and optimized for high performance on many-core systems with shared memory. A high-performance parallel algorithm for solving the problem has been developed, and its scalability and efficiency are investigated. Results of numerical modeling are presented for a two-dimensional heterogeneous medium describing a realistic geological structure typical of the North Sea.
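The "factor once, reuse for many right-hand sides" idea in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: a 1D model matrix tridiag(-1, 2.5, -1) stands in for the sign-definite spatial operator, two overlapping subdomains are used, the subdomain solves are combined with a restricted update (each subdomain writes back only the points it owns), and a small hand-written tridiagonal LU replaces PARDISO. The function names are hypothetical.

```python
def lu_tridiag(n, diag, off):
    """Factor the n x n matrix tridiag(off, diag, off) once.
    No pivoting is used; this is safe here because the model
    matrix is strictly diagonally dominant."""
    piv = [diag] * n
    mult = [0.0] * n
    for i in range(1, n):
        mult[i] = off / piv[i - 1]
        piv[i] = diag - mult[i] * off
    return mult, piv, off


def lu_solve(factors, rhs):
    """Solve with stored factors: forward then backward substitution."""
    mult, piv, off = factors
    n = len(rhs)
    y = list(rhs)
    for i in range(1, n):
        y[i] -= mult[i] * y[i - 1]
    x = [0.0] * n
    x[-1] = y[-1] / piv[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (y[i] - off * x[i + 1]) / piv[i]
    return x


def schwarz_solve(f, factors, subdomains, owned, off, sweeps=40):
    """Parallel Schwarz iteration: every subdomain solve in a sweep uses
    only the previous iterate, so the solves are mutually independent."""
    n = len(f)
    u = [0.0] * n
    for _ in range(sweeps):
        new_u = list(u)
        for fac, (lo, hi), (olo, ohi) in zip(factors, subdomains, owned):
            rhs = f[lo:hi]  # slicing a list yields a copy
            if lo > 0:      # interface value taken from the previous iterate
                rhs[0] -= off * u[lo - 1]
            if hi < n:
                rhs[-1] -= off * u[hi]
            local = lu_solve(fac, rhs)
            # Restricted update (an assumption): keep only the owned points.
            new_u[olo:ohi] = local[olo - lo:ohi - lo]
        u = new_u
    return u


def point_source(n, k):
    f = [0.0] * n
    f[k] = 1.0
    return f


n, diag, off = 40, 2.5, -1.0
subdomains = [(0, 24), (16, 40)]  # overlapping index ranges [lo, hi)
owned = [(0, 20), (20, 40)]       # non-overlapping ownership

# Factor each subdomain matrix a single time ...
factors = [lu_tridiag(hi - lo, diag, off) for lo, hi in subdomains]

# ... and reuse the factors for several right-hand sides.
rhs_list = [point_source(n, 10), point_source(n, 30)]
solutions = [schwarz_solve(f, factors, subdomains, owned, off) for f in rhs_list]

# Reference: one direct solve on the undecomposed domain.
global_factors = lu_tridiag(n, diag, off)
reference = [lu_solve(global_factors, f) for f in rhs_list]
max_err = [max(abs(a - b) for a, b in zip(s, r))
           for s, r in zip(solutions, reference)]
print(max_err)
```

In this sketch the factorization cost is paid once per subdomain, while every Schwarz sweep and every new right-hand side costs only forward and backward substitutions. A direct sparse solver such as PARDISO separates factorization from the solve step in a similar way.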
Similar resources
Performance of FDM Simulation of Seismic Wave Propagation using the ppOpen-APPL/FDM Library on the Intel Xeon Phi Coprocessor
We evaluated the performance of a parallel 3D FDM simulation of seismic wave propagation using the Intel Xeon Phi coprocessor. We confirmed that MPI/OpenMP hybrid parallel computing with hyper-threading is more efficient than pure MPI parallelism. The performance of the thread parallel computing was further improved by fusing the original three DO loops of major kernel routines into two DO loop...
Intel Cilk Plus for complex parallel algorithms: "Enormous Fast Fourier Transforms" (EFFT) library
In this paper we demonstrate the methodology for parallelizing the computation of large one-dimensional discrete fast Fourier transforms (DFFTs) on multi-core Intel Xeon processors. DFFTs based on the recursive Cooley-Tukey method have to control cache utilization, memory bandwidth and vector hardware usage, and at the same time scale across multiple threads or compute nodes. Our method builds ...
Fast recursive matrix multiplication for multi-core architectures
In this article, we present a fast algorithm for matrix multiplication optimized for recent multicore architectures. The implementation exploits different methodologies from parallel programming, like recursive decomposition, efficient low-level implementations of basic blocks, software prefetching, and task scheduling resulting in a multilevel algorithm with adaptive features. Measurements on ...
Evaluation of DGEMM Implementation on Intel Xeon Phi Coprocessor
In this paper we will present a detailed study of implementing double-precision matrix-matrix multiplication (DGEMM) utilizing the Intel Xeon Phi Coprocessor. We discuss a DGEMM algorithm implementation running "natively" on the coprocessor, minimizing communication with the host CPU. We will run DGEMM across a range of matrix sizes natively as well using Intel Math Kernel Library. Our optimiza...
Reproducible, Accurately Rounded and Efficient BLAS
Numerical reproducibility failures rise in parallel computation because floating-point summation is non-associative. Massively parallel and optimized executions dynamically modify the floating-point operation order. Hence, numerical results may change from one run to another. We propose to ensure reproducibility by extending as far as possible the IEEE-754 correct rounding property to larger op...
Publication date: 2012